speech recognition milestone
A New Real-Time AI Platform from Microsoft, and a Speech Recognition Milestone
Earlier this week, our research team reached a 5.1 percent error rate with our speech recognition system – a new industry milestone that substantially surpasses the accuracy we achieved last year. We reduced our error rate by 12 percent from last year's level, using improvements to our neural net-based acoustic and language models. We introduced an additional convolutional neural network combined with bidirectional long short-term memory (CNN-BLSTM) model for improved acoustic modeling. Additionally, our approach to combining predictions from multiple acoustic models now does so at both the frame/senone and word levels. We published a technical report that has the full system details.
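The excerpt does not say how predictions are combined at the frame/senone level. As a rough illustration only, one simple way to do it is to average each model's per-frame senone posteriors before decoding. The sketch below assumes that approach; the function names, the weights, and the toy numbers are hypothetical and are not taken from the technical report.

```python
from typing import List, Optional

Posteriors = List[List[float]]  # posteriors[frame][senone]


def combine_frame_posteriors(
    model_posteriors: List[Posteriors],
    weights: Optional[List[float]] = None,
) -> Posteriors:
    """Weighted average of senone posteriors, frame by frame.

    Assumption: every model scores the same frames and the same senone set.
    """
    n_models = len(model_posteriors)
    if weights is None:
        weights = [1.0 / n_models] * n_models  # equal weighting by default
    combined = []
    for t in range(len(model_posteriors[0])):
        n_senones = len(model_posteriors[0][t])
        frame = [
            sum(w * model[t][s] for w, model in zip(weights, model_posteriors))
            for s in range(n_senones)
        ]
        combined.append(frame)
    return combined


def best_senone_per_frame(posteriors: Posteriors) -> List[int]:
    """Index of the highest-scoring senone in each frame."""
    return [max(range(len(frame)), key=frame.__getitem__) for frame in posteriors]


# Toy example: two hypothetical models, two frames, three senones.
model_a = [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3]]
model_b = [[0.5, 0.4, 0.1], [0.2, 0.2, 0.6]]
combined = combine_frame_posteriors([model_a, model_b])
print(best_senone_per_frame(combined))
```

In the second toy frame the two models disagree, and the averaged scores decide between them. Word-level combination works on whole hypotheses rather than frames and is not sketched here.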
Microsoft hits a speech recognition milestone with a system just as good as human ears
It's a red-letter day at Microsoft Research: a team working on speech recognition has hit a serious symbolic goal with a system that's as good as you at hearing what people are saying. Specifically, the system has a "word error rate" of 5.9 percent, on par with professional human transcribers. Even they don't hear things perfectly, of course, but 94 percent accuracy is more than good enough for conversation. "This accomplishment is the culmination of over twenty years of effort," said Geoffrey Zweig, one of the researchers, in a Microsoft blog post. Indeed, speech recognition is one of those tasks that's been pursued for decades by pretty much every major tech business and research outfit.
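For readers unfamiliar with the metric, word error rate is conventionally the number of word substitutions, deletions, and insertions needed to turn the system's transcript into the reference transcript, divided by the number of words in the reference. The sketch below is a minimal illustration of that definition; the function name is made up for this example, and splitting on whitespace is a simplifying assumption (real evaluations normalize the text first).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (word-level Levenshtein distance).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            )
        prev = curr
    return prev[-1] / len(ref)


# One dropped word out of six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

The example prints roughly 0.167, since one of six reference words is missing. A 5.9 percent WER on this scale means about 6 errors per 100 reference words, which is where the "94 percent accuracy" shorthand comes from.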
Microsoft researchers achieve speech recognition milestone - Next at Microsoft
Microsoft researchers have reached a milestone in the quest for computers to understand speech as well as humans. Xuedong Huang, the company's chief speech scientist, reports that in a recent benchmark evaluation against the industry-standard Switchboard speech recognition task, Microsoft researchers achieved a word error rate (WER) of 6.3 percent, the lowest in the industry. In a research paper published Tuesday, the scientists said: "Our best single system achieves an error rate of 6.9% on the NIST 2000 Switchboard set. We believe this is the best performance reported to date for a recognition system not based on system combination." This past weekend, at Interspeech, an international conference on speech communication and technology held in San Francisco, IBM said it has achieved a WER of 6.6 percent. Twenty years ago, the best published research system had a WER greater than 43 percent. "This new milestone benefited from a wide range of new technologies ...